This paper focuses on the task of survival time analysis for lung cancer. Although much progress has been made on this problem in recent years, the performance of existing methods is still far from satisfactory. Traditional and some deep learning-based survival time analyses for lung cancer are mostly based on textual clinical information such as staging, age, and histology. Unlike existing methods that predict from a single modality, we observe that a human clinician usually combines multimodal data, such as textual clinical data and visual scans, to estimate survival time. Motivated by this, we contribute a cross-modality survival analysis network named Lite-ProSENet that mimics a human clinician's manner of decision making. Extensive experiments were conducted using data from 422 NSCLC patients from The Cancer Imaging Archive (TCIA). The results show that our Lite-ProSENet compares favorably against all competing methods and achieves a new state of the art with a concordance of 89.3%. The code will be made publicly available.
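No code accompanies the abstract; the snippet below is only a minimal sketch, assuming pre-extracted CT features and made-up layer sizes, of the kind of two-branch cross-modality fusion it describes. It is not the actual Lite-ProSENet architecture.

```python
import torch
import torch.nn as nn

class TwoBranchSurvivalNet(nn.Module):
    """Minimal sketch of a cross-modality survival model (not the actual Lite-ProSENet)."""
    def __init__(self, clinical_dim=16, visual_dim=512, hidden=128):
        super().__init__()
        # Text branch: embeds tabular clinical variables (stage, age, histology, ...).
        self.clinical_branch = nn.Sequential(nn.Linear(clinical_dim, hidden), nn.ReLU())
        # Visual branch: assumes CT features were pre-extracted by a 3D CNN backbone.
        self.visual_branch = nn.Sequential(nn.Linear(visual_dim, hidden), nn.ReLU())
        # Fusion head outputs a single risk score usable with a Cox-style ranking loss.
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, clinical, visual):
        fused = torch.cat([self.clinical_branch(clinical),
                           self.visual_branch(visual)], dim=-1)
        return self.head(fused).squeeze(-1)  # higher score = higher predicted risk

# Usage on random tensors standing in for a batch of 4 patients.
model = TwoBranchSurvivalNet()
risk = model(torch.randn(4, 16), torch.randn(4, 512))
print(risk.shape)  # torch.Size([4])
```

The concordance index reported above measures how well such predicted risk scores rank patients relative to their observed survival times.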
Differentiable architecture search (DARTS) has received massive attention in recent years, mainly because it significantly reduces the computational cost through weight sharing and continuous relaxation. However, more recent works find that existing differentiable NAS techniques struggle to outperform naive baselines, yielding deteriorative architectures as the search proceeds. Rather than directly optimizing the architecture parameters, this paper formulates neural architecture search as a distribution learning problem by relaxing the architecture weights into Gaussians. By leveraging natural-gradient variational inference (NGVI), the architecture distribution can be easily optimized on top of existing codebases without incurring more memory and computational consumption. We demonstrate how differentiable NAS benefits from Bayesian principles, enhancing exploration and improving stability. Experimental results on the NAS-Bench-201 and NAS-Bench-1shot1 benchmark datasets confirm the significant improvements the proposed framework can make. In addition, instead of simply applying argmax on the learned parameters, we further leverage the recently proposed training-free proxies in NAS to select the optimal architecture from a group of architectures drawn from the optimized distribution, achieving state-of-the-art results on the NAS-Bench-201 and NAS-Bench-1shot1 benchmarks. Our best architecture in the DARTS search space also obtains competitive test errors of 2.37\%, 15.72\%, and 24.2\% on the CIFAR-10, CIFAR-100, and ImageNet datasets, respectively.
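To make the core relaxation concrete, here is a hedged toy sketch (not the authors' implementation): architecture weights for one edge are sampled from a learnable Gaussian and converted to operation mixing coefficients with a softmax. The NGVI update described above is omitted; a plain reparameterized Adam step stands in for it.

```python
import torch
import torch.nn.functional as F

# Architecture distribution for one edge with 5 candidate operations:
# each weight alpha_i ~ N(mu_i, sigma_i^2) is learned instead of a point estimate.
mu = torch.zeros(5, requires_grad=True)
log_sigma = torch.zeros(5, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.1)

def sample_mixing_weights():
    eps = torch.randn_like(mu)                # reparameterization trick
    alpha = mu + torch.exp(log_sigma) * eps   # sampled architecture weights
    return F.softmax(alpha, dim=-1)           # continuous relaxation over operations

# One toy update step: pretend the supernet loss prefers operation 2.
weights = sample_mixing_weights()
loss = -torch.log(weights[2])
opt.zero_grad()
loss.backward()
opt.step()
print(sample_mixing_weights())
```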
Although differentiable architecture search (DARTS) has become the mainstream paradigm in neural architecture search (NAS) because of its simplicity and efficiency, recent works find that the performance of the searched architecture barely increases as the DARTS optimization proceeds, and that the final magnitudes obtained by DARTS can hardly indicate the importance of operations. The above observations reveal that the supervision signal in DARTS may be a poor or unreliable indicator for architecture search, motivating an interesting and promising direction: can we measure the importance of operations under the differentiable paradigm without any training? We provide an affirmative answer by customizing NAS as a pruning-at-initialization problem. With the recently proposed synaptic saliency criteria from pruning at initialization, we seek to score the importance of candidate operations without any training and propose a novel framework called training-free differentiable architecture search (FreeDARTS). We show that, without any training, FreeDARTS with different proxy metrics can outperform most NAS baselines in different search spaces. More importantly, FreeDARTS is extremely memory- and computation-efficient because it abandons training in the architecture search phase, which enables architecture search over more flexible spaces and eliminates the depth gap between architecture search and evaluation. We hope our work inspires more attempts to tackle NAS from the perspective of pruning at initialization.
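As a rough illustration of training-free operation scoring in this spirit (the exact proxy and search space used by FreeDARTS may differ), the snippet below scores the candidate operations of one edge at initialization with a SynFlow-style saliency, the sum of |weight x gradient| under an all-ones input, without any training.

```python
import torch
import torch.nn as nn

# Candidate operations for a single edge, scored at initialization (no training).
candidates = {
    "conv3x3": nn.Conv2d(8, 8, 3, padding=1, bias=False),
    "conv1x1": nn.Conv2d(8, 8, 1, bias=False),
    "identity": nn.Identity(),
}

def synflow_score(op):
    """SynFlow-style saliency: sum of |weight * gradient| for an all-ones input."""
    params = [p for p in op.parameters() if p.requires_grad]
    if not params:                       # parameter-free ops get score 0 in this toy proxy
        return 0.0
    x = torch.ones(1, 8, 16, 16)
    out = op(x).sum()
    grads = torch.autograd.grad(out, params)
    return sum((p * g).abs().sum().item() for p, g in zip(params, grads))

scores = {name: synflow_score(op) for name, op in candidates.items()}
print(max(scores, key=scores.get), scores)   # pick the highest-scoring operation
```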
Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
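As a generic, hypothetical illustration of what a fine-tuned Transformer baseline for a dataset like MAUD could look like (the real MAUD schema, label set, and baseline configuration come from the dataset release, not from this sketch), a standard question-plus-clause sequence-classification setup might be assembled as follows.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder setup: classify a merger-agreement clause against one deal-point question.
# num_labels=3 and the example question are illustrative assumptions only.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=3)

question = "What is the standard for the Material Adverse Effect carve-out?"  # hypothetical
clause = "Material Adverse Effect means any change, event or occurrence that ..."
inputs = tokenizer(question, clause, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))  # untrained head, so the label is arbitrary until fine-tuning
```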
Participants in political discourse employ rhetorical strategies -- such as hedging, attributions, or denials -- to display varying degrees of belief commitments to claims proposed by themselves or others. Traditionally, political scientists have studied these epistemic phenomena through labor-intensive manual content analysis. We propose to help automate such work through epistemic stance prediction, drawn from research in computational semantics, to distinguish at the clausal level what is asserted, denied, or only ambivalently suggested by the author or other mentioned entities (belief holders). We first develop a simple RoBERTa-based model for multi-source stance predictions that outperforms more complex state-of-the-art modeling. Then we demonstrate its novel application to political science by conducting a large-scale analysis of the Mass Market Manifestos corpus of U.S. political opinion books, where we characterize trends in cited belief holders -- respected allies and opposed bogeymen -- across U.S. political ideologies.
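A hedged sketch of the kind of clausal stance classifier described above: the belief holder and the target clause are marked with inline tags and a RoBERTa sequence classifier predicts one of three stances. The tagging scheme and label names here are illustrative assumptions, not the paper's exact formulation.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["asserted", "denied", "ambivalent"]  # illustrative label set

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(LABELS))

# Mark the belief holder and the target clause with simple inline tags (an assumption).
text = ("[HOLDER] The senator [/HOLDER] claimed that "
        "[CLAUSE] the policy would reduce the deficit [/CLAUSE].")
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    pred = model(**inputs).logits.argmax(dim=-1).item()
print(LABELS[pred])  # untrained head, so the prediction is arbitrary until fine-tuned
```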
While inferring common actor states (such as position or velocity) is an important and well-explored task of the perception system aboard a self-driving vehicle (SDV), it may not always provide sufficient information to the SDV. This is especially true in the case of active emergency vehicles (EVs), where light-based signals also need to be captured to provide a full context. We consider this problem and propose a sequential methodology for the detection of active EVs, using an off-the-shelf CNN model operating at a frame level and a downstream smoother that accounts for the temporal aspect of flashing EV lights. We also explore model improvements through data augmentation and training with additional hard samples.
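The downstream smoother is described only at a high level; as a minimal stand-in, the following sliding-window average over per-frame CNN probabilities yields a stable "active EV" decision even though a flashing light is dark in many individual frames.

```python
import numpy as np

def smooth_active_ev(frame_probs, window=4, threshold=0.4):
    """Sliding-window mean over per-frame 'active EV' probabilities.

    frame_probs: CNN outputs in [0, 1], one per video frame. A flashing light is
    only 'on' in some frames, so averaging over a short window stabilizes the call.
    """
    probs = np.asarray(frame_probs, dtype=float)
    kernel = np.ones(window) / window
    smoothed = np.convolve(probs, kernel, mode="same")  # edges are zero-padded
    return smoothed >= threshold

probs = [0.9, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1]  # alternating on/off flashes
print(smooth_active_ev(probs))          # mostly True despite the raw flicker
print(smooth_active_ev([0.1] * 8))      # stays False for a non-EV sequence
```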
A key feature of federated learning (FL) is to preserve the data privacy of end users. However, there still exists potential privacy leakage in exchanging gradients under FL. As a result, recent research often explores differential privacy (DP) approaches that add noise to the computed results to address privacy concerns with low overheads, which however degrades the model performance. In this paper, we strike a balance between data privacy and efficiency by utilizing the pervasive social connections between users. Specifically, we propose SCFL, a novel Social-aware Clustered Federated Learning scheme, where mutually trusted individuals can freely form a social cluster and aggregate their raw model updates (e.g., gradients) inside each cluster before uploading to the cloud for global aggregation. By mixing model updates in a social group, adversaries can only eavesdrop on the social-layer combined results, but not the privacy of individuals. We unfold the design of SCFL in three steps. \emph{i) Stable social cluster formation.} Considering users' heterogeneous training samples and data distributions, we formulate the optimal social cluster formation problem as a federation game and devise a fair revenue allocation mechanism to resist free-riders. \emph{ii) Differentiated trust-privacy mapping.} For clusters with low mutual trust, we design a customizable privacy preservation mechanism to adaptively sanitize participants' model updates depending on social trust degrees. \emph{iii) Distributed convergence.} A distributed two-sided matching algorithm is devised to attain an optimized disjoint partition with Nash-stable convergence. Experiments on the Facebook network and the MNIST/CIFAR-10 datasets validate that our SCFL can effectively enhance learning utility, improve user payoff, and enforce customizable privacy protection.
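A minimal sketch of the social-layer mixing step only (the federation game, trust-dependent sanitization, and matching algorithm from steps i)-iii) are omitted): members of a trusted cluster average their raw updates locally and only the cluster-level average reaches the server, so no individual gradient is ever exposed.

```python
import numpy as np

def cluster_mix(updates):
    """Average raw model updates inside one trusted social cluster."""
    return np.mean(np.stack(updates), axis=0)

def global_aggregate(cluster_updates, cluster_sizes):
    """Server-side weighted average over cluster-level (not individual) updates."""
    weights = np.array(cluster_sizes, dtype=float)
    weights /= weights.sum()
    return np.tensordot(weights, np.stack(cluster_updates), axes=1)

# Toy example: two clusters of users, each update is a flat parameter vector.
cluster_a = [np.random.randn(10) for _ in range(3)]   # 3 mutually trusting users
cluster_b = [np.random.randn(10) for _ in range(2)]   # 2 mutually trusting users
mixed_a, mixed_b = cluster_mix(cluster_a), cluster_mix(cluster_b)
global_update = global_aggregate([mixed_a, mixed_b], [3, 2])
print(global_update.shape)  # (10,) -- the server only ever sees mixed_a and mixed_b
```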
Early detection of relevant locations in a piece of news is especially important in extreme events such as environmental disasters, war conflicts, disease outbreaks, or political turmoil. Additionally, this detection also helps recommender systems to promote relevant news based on user locations. Note that, when the relevant locations are not mentioned explicitly in the text, state-of-the-art methods typically fail to recognize them because these methods rely on syntactic recognition. In contrast, by incorporating a knowledge base and connecting entities with their locations, our system successfully infers the relevant locations even when they are not mentioned explicitly in the text. To evaluate the effectiveness of our approach, and due to the lack of datasets in this area, we also contribute to the research community a gold-standard multilingual news-location dataset, NewsLOC. It contains annotations of the relevant locations (and their WikiData IDs) of 600+ Wikinews articles in five different languages: English, French, German, Italian, and Spanish. Through experimental evaluations, we show that our proposed system outperforms the baselines as well as a fine-tuned version of the model that uses semi-supervised data to increase the classification rate. The source code and the NewsLOC dataset are publicly available for use by the research community at https://github.com/vsuarezpaniagua/NewsLocation.
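A toy, hedged illustration of the knowledge-base idea: the actual system links entities to WikiData, whereas here a small hand-written dictionary stands in for the KB, mapping mentioned entities to locations that never appear explicitly in the text.

```python
# Toy knowledge base mapping entities to locations; the real system would resolve
# entity mentions against WikiData and return the corresponding WikiData IDs.
ENTITY_LOCATIONS = {
    "Eiffel Tower": "Paris",
    "Brandenburg Gate": "Berlin",
    "La Scala": "Milan",
}

def infer_locations(text):
    """Return locations implied by entity mentions, even if never named in the text."""
    return {place for entity, place in ENTITY_LOCATIONS.items() if entity in text}

headline = "Protesters gathered at the Brandenburg Gate on Sunday evening."
print(infer_locations(headline))  # {'Berlin'} although 'Berlin' never appears
```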
In recent years, multi-label, multi-class video action recognition has gained significant popularity. While reasoning over temporally connected atomic actions is mundane for intelligent species, standard artificial neural networks (ANNs) still struggle to classify them. In the real world, atomic actions often temporally connect to form more complex composite actions. The challenge lies in recognising composite actions of varying durations while other distinct composite or atomic actions occur in the background. Drawing upon the success of relational networks, we propose methods that learn to reason over the semantic concepts of objects and actions. We empirically show how ANNs benefit from pretraining, relational inductive biases, and unordered set-based latent representations. In this paper we propose deep set conditioned I3D (SCI3D), a two-stream relational network that employs a latent representation of state and a visual representation for reasoning over events and actions. The streams learn to reason about temporally connected actions in order to identify all of them in the video. The proposed method achieves an improvement of around 1.49% mAP in atomic action recognition and 17.57% mAP in composite action recognition over an I3D-NL baseline on the CATER dataset.
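As a rough sketch of the relational-reasoning ingredient only (not the full two-stream SCI3D model), a simple relation-network block aggregates pairwise interactions over an unordered set of object/action embeddings into a video-level multi-label prediction.

```python
import torch
import torch.nn as nn

class RelationBlock(nn.Module):
    """Minimal relation network over an unordered set of entity embeddings."""
    def __init__(self, dim=64, num_classes=20):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())  # pairwise relation
        self.f = nn.Linear(dim, num_classes)                        # multi-label head

    def forward(self, entities):                        # entities: (batch, n, dim)
        b, n, d = entities.shape
        left = entities.unsqueeze(2).expand(b, n, n, d)
        right = entities.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([left, right], dim=-1)        # all ordered pairs of entities
        relations = self.g(pairs).sum(dim=(1, 2))       # order-invariant aggregation
        return torch.sigmoid(self.f(relations))         # per-class probabilities

# Toy usage: 7 detected objects/atomic actions per clip, 64-d embeddings each.
probs = RelationBlock()(torch.randn(2, 7, 64))
print(probs.shape)  # torch.Size([2, 20])
```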
Large language models (LLMs) have demonstrated excellent zero-shot generalization to new language tasks. However, effective utilization of LLMs for zero-shot visual question answering (VQA) remains challenging, primarily due to the modality disconnection and task disconnection between LLMs and the VQA task. End-to-end training on vision and language data may bridge the disconnections, but it is inflexible and computationally expensive. To address this issue, we propose \emph{Img2Prompt}, a plug-and-play module that provides prompts bridging the aforementioned modality and task disconnections, so that LLMs can perform zero-shot VQA tasks without end-to-end training. To construct such prompts, we employ LLM-agnostic models to produce descriptions of the image content and self-constructed question-answer pairs, which can effectively guide the LLM to perform zero-shot VQA tasks. Img2Prompt offers the following benefits: 1) It can flexibly work with various LLMs to perform VQA. 2) Without the need for end-to-end training, it significantly reduces the cost of deploying LLMs for zero-shot VQA tasks. 3) It achieves comparable or better performance than methods relying on end-to-end training. For example, we outperform Flamingo~\cite{Deepmind:Flamingo2022} by 5.6\% on VQAv2. On the challenging A-OKVQA dataset, our method even outperforms few-shot methods by as much as 20\%.
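A hedged sketch of the prompt-construction idea (not the actual Img2Prompt module): an image caption produced by an off-the-shelf captioner and a few self-constructed question-answer pairs are concatenated into a text prompt, so a frozen LLM can answer the visual question without seeing any pixels. The prompt format below is an illustrative assumption.

```python
def build_vqa_prompt(caption, exemplar_qas, question):
    """Assemble a text-only prompt from image-derived content (illustrative format)."""
    lines = [f"Image description: {caption}", ""]
    for q, a in exemplar_qas:                           # self-constructed QA pairs
        lines.append(f"Question: {q}\nAnswer: {a}")
    lines.append(f"Question: {question}\nAnswer:")      # the actual VQA question
    return "\n".join(lines)

caption = "A brown dog is catching a yellow frisbee in a grassy park."   # from a captioner
exemplar_qas = [("What color is the frisbee?", "yellow"),
                ("Where was this photo taken?", "a park")]               # auto-generated
prompt = build_vqa_prompt(caption, exemplar_qas, "What animal is in the picture?")
print(prompt)  # a frozen LLM would complete the final 'Answer:' line
```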